Filtering and Clustering XML Retrieval Results

نویسندگان

  • Jaap Kamps
  • Marijn Koolen
  • Börkur Sigurbjörnsson
چکیده

As part of the INEX 2006 Adhoc Track, we conducted a range of experiments with filtering and clustering XML element retrieval results. Our basic retrieval engine retrieves arbitrary elements from the collection (corresponding to the Thorough Task). These runs are filtered to remove textual overlap between elements (corresponding to the Focused Task). The resulting runs can be clustered per article (corresponding to the All in Context Task). Finally, we select the “best” element for each article (corresponding to the Best in Context Task). Our main findings are the following. First, a complete element index outperforms a restricted index based on section-structure, albeit the differences are small. Second, grouping non-overlapping elements per article does not lead to performance degradation, but may improve scores. Third, all restrictions of the “pure” element runs (by removing overlap, by grouping elements per article, or by selecting a single element per article) lead to some but only moderate loss of precision.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

خوشه‌بندی فراابتکاری اسناد فارسی اِکس‌اِم‌اِل مبتنی بر شباهت ساختاری و محتوایی

Due to the increasing number of documents, XML, effectively organize these documents in order to retrieve useful information from them is essential. A possible solution is performed on the clustering of XML documents in order to discover knowledge. Clustering XML documents is a key issue of how to measure the similarity between XML documents. Conventional clustering of text documents using a do...

متن کامل

Prototyping a Vibrato-Aware Query-By-Humming (QBH) Music Information Retrieval System for Mobile Communication Devices: Case of Chromatic Harmonica

Background and Aim: The current research aims at prototyping query-by-humming music information retrieval systems for smart phones. Methods: This multi-method research follows simulation technique from mixed models of the operations research methodology, and the documentary research method, simultaneously. Two chromatic harmonica albums comprised the research population. To achieve the purpose ...

متن کامل

XML Filtering Using Dynamic Hierarchical Clustering of User Profiles

Information filtering systems constitute a critical component in modern information seeking applications. As the number of users grows and the information available becomes even bigger it is crucial to employ scalable and efficient representation and filtering techniques. In this paper we propose an innovative XML filtering system that utilizes clustering of user profiles in order to reduce the...

متن کامل

Bitmap Indexing-based Clustering and Retrieval of XML Documents

This paper describes a bitmap indexing based technique to cluster XML documents. XML documents can be hierarchically represented by elements. To improve performance of information retrieval, documents can be indexed using bitmap techniques. Such a bitmap index is sparse, meaning it contains unnecessarily many zero bits, especially for the word dimension. To remove zero bits and improve the perf...

متن کامل

Project-Join-Repair: An Approach to Consistent Query Answering Under Functional Dependencies

Algebra-based identification of tree patterns in XQuery p. 13 Approximate querying of XML fuzzy data p. 26 Relaxation paradigm in a flexible querying context p. 39 A functional model for data analysis p. 51 Materialization-based range and k-nearest neighbor query processing algorithms p. 65 Flexible querying using structural and event based multimodal video data model p. 75 Reverse nearest neig...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006